Introduction to geopandas and cartopy
Basic Setup#
Again we will be using pandas and matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
We’ll also suppress a few distracting warnings.
import warnings
warnings.filterwarnings('ignore')
Why do we need something other than pandas?#
Let’s reload our example dataset of conventional power plants in Europe as a pd.DataFrame.
fn = "https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv"
ppl = pd.read_csv(fn, index_col=0)
This dataset includes coordinates (latitude and longitude), which allow us to plot the location and capacity of all power plants in a scatter plot:
ppl.plot.scatter('lon', 'lat', s=ppl.Capacity/1e3)
<AxesSubplot:xlabel='lon', ylabel='lat'>
However, this graph lacks the geographic reference points we would normally expect on a map, such as shorelines and country borders.
Geopandas - a Pandas extension for geospatial data#

Geopandas extends pandas by adding support for geospatial data.
The core data structure in GeoPandas is the geopandas.GeoDataFrame, a subclass of pandas.DataFrame, that can store geometry columns and perform spatial operations.
Note
Documentation for this package is available at https://geopandas.org/en/stable/.
Typical geometries are points, lines, and polygons. They come from another library called shapely.
First, we need to import the geopandas package. The conventional alias is gpd:
import geopandas as gpd
We can convert the latitude and longitude values given in the dataset to formal geometries (to be exact, shapely Point objects, but we won’t go into detail on this) using the gpd.points_from_xy() function, and use them to create a gpd.GeoDataFrame. We should also specify a so-called coordinate reference system (CRS). The code 4326 refers to EPSG:4326 (WGS 84), in which coordinates are given as latitude and longitude in degrees.
geometry = gpd.points_from_xy(ppl['lon'], ppl['lat'])
gdf = gpd.GeoDataFrame(ppl, geometry=geometry, crs=4326)
Now, the gdf looks like this:
gdf.head(3)
| id | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | DateIn | DateRetrofit | DateOut | lat | lon | Duration | Volume_Mm3 | DamHeight_m | StorageCapacity_MWh | EIC | projectID | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Doel | Nuclear | Steam Turbine | PP | Belgium | 2911.0 | NaN | 1975.0 | NaN | 2022.0 | 51.32481 | 4.25889 | NaN | 0.0 | 0.0 | 0.0 | {'22WDOELX3000078D', '22WDOELX2000077N', '22WD... | {'ENTSOE': {'22WDOELX3000078D', '22WDOELX20000... | POINT (4.25889 51.32481) |
| 1 | Sarrans | Hydro | Reservoir | Store | France | 183.0 | NaN | 1932.0 | NaN | NaN | 44.82942 | 2.74042 | NaN | 0.0 | 0.0 | 0.0 | {'17W100P100P02934'} | {'ENTSOE': {'17W100P100P02934'}, 'OPSD': {'OEU... | POINT (2.74042 44.82942) |
| 2 | Pragneres | Hydro | Reservoir | Store | France | 189.2 | NaN | 1953.0 | NaN | NaN | 42.82110 | 0.01033 | NaN | 0.0 | 0.0 | 0.0 | {'17W100P100P02918'} | {'ENTSOE': {'17W100P100P02918'}, 'OPSD': {'OEU... | POINT (0.01033 42.82110) |
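One detail that is easy to get wrong is the argument order of gpd.points_from_xy(): x (longitude) comes first, then y (latitude). The following minimal sketch uses a small stand-in DataFrame (the name demo and its two rows are illustrative, with coordinates taken from the first two plants in the table) so it runs without downloading the full dataset:

```python
import pandas as pd
import geopandas as gpd

# Small stand-in for the power plant table (illustrative only).
demo = pd.DataFrame(
    {
        "Name": ["Doel", "Sarrans"],
        "lat": [51.32481, 44.82942],
        "lon": [4.25889, 2.74042],
    }
)

# points_from_xy takes x (longitude) first, then y (latitude).
points = gpd.points_from_xy(demo["lon"], demo["lat"])
demo_gdf = gpd.GeoDataFrame(demo, geometry=points, crs=4326)

first = demo_gdf.geometry.iloc[0]
print(first.x, first.y)        # longitude, latitude of the first plant
print(demo_gdf.crs.to_epsg())  # EPSG code of the assigned CRS
```

Swapping the two arguments would not raise an error; the points would simply end up in the wrong place on the map.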
With the additional geometry column, it is now even easier to plot the geographic data:
gdf.plot(
column='Fueltype',
markersize=gdf.Capacity/1e2,
)
<AxesSubplot:>
We can also start up an interactive map to explore the geodata in more detail:
gdf.explore(column='Fueltype')
Map Projections with Cartopy#

Cartopy is a Python package designed for geospatial data processing. It exposes an interface that makes it easy to create maps with matplotlib.
The Earth is a globe, but we usually present maps on two-dimensional surfaces. Hence, we typically need to project data points onto flat surfaces (e.g. screens, paper). However, we will always lose some information in doing so.
A map projection is:
a systematic transformation of the latitudes and longitudes of locations from the surface of a sphere or an ellipsoid into locations on a plane. Wikipedia: Map projection.
Different projections preserve different metric properties. As a result, converting geodata from one projection to another is a common exercise in geographic data science.
conformal projections preserve angles/directions (e.g. Mercator projection)
equal-area projections preserve area measure (e.g. Mollweide)
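equidistant projections preserve distances along certain lines or from certain points, but not between all pairs of points (e.g. Plate carrée, which preserves distances along meridians)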
compromise projections seek to strike a balance between distortions (e.g. Robinson)
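To make the trade-off concrete, here is a small sketch of how much the conformal Mercator projection inflates areas. It uses only the standard library and assumes a spherical Earth; the function names and the radius constant are illustrative and not part of cartopy:

```python
import math

R = 6371.0  # mean Earth radius in km (spherical approximation)


def mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to spherical Mercator x/y in km."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def area_inflation(lat_deg):
    """Factor by which Mercator inflates small areas at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Numerical check of the north-south stretch around 60 degrees north:
# map distance divided by true distance for a short step along a meridian.
dy = mercator(0, 60.01)[1] - mercator(0, 59.99)[1]
stretch = dy / (R * math.radians(0.02))

for lat in (0, 30, 60):
    print(lat, round(area_inflation(lat), 2))
print(round(stretch, 3))
```

At 60° north, lengths are stretched by about a factor of two in both directions, so areas appear roughly four times larger than at the equator. Angles stay correct, which is the property Mercator is designed to keep.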
If you like the “Orange-as-Earth” analogy for projections, check out this Numberphile video by Hannah Fry.
Note
Documentation for this package is available at https://scitools.org.uk/cartopy/docs/latest/.
First, we need to import the relevant parts of the cartopy package:
import cartopy
import cartopy.crs as ccrs
Let’s draw a first map with cartopy outlining the global coastlines in the so-called plate carrée projection (equirectangular projection):
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fe42f8afca0>
A list of the available projections can be found on the Cartopy projection list page.
ax = plt.axes(projection=ccrs.Mollweide())
ax.stock_img()
<matplotlib.image.AxesImage at 0x7fe42f7d14f0>
We can combine the functionality of cartopy with geopandas plots:
fig = plt.figure(figsize=(7,7))
ax = plt.axes(projection=ccrs.PlateCarree())
gdf.plot(
ax=ax,
column='Fueltype',
markersize=gdf.Capacity/1e2,
)
<GeoAxesSubplot:>
We can add further geographic features to this map for better orientation.
For instance, we can add the coastlines…
ax.coastlines()
fig
… country borders …
ax.add_feature(cartopy.feature.BORDERS, color='grey', linewidth=0.5)
fig
… colour in the ocean in blue …
ax.add_feature(cartopy.feature.OCEAN, color='azure')
fig
… and colour in the land area in yellow …
ax.add_feature(cartopy.feature.LAND, color='cornsilk')
fig
Geopandas will automatically calculate sensible bounds for the plot given the geographic data.
But we can also manually zoom in or out by setting the spatial extent with the .set_extent() method:
ax.set_extent([5, 16, 47, 55])
fig
Reprojecting a GeoDataFrame#
In geopandas, we can use the .to_crs() method to convert a GeoDataFrame to a desired coordinate reference system. In this particular case, we use the proj4_init string of an initialised cartopy projection to reproject our power plant GeoDataFrame.
fig = plt.figure(figsize=(7,7))
crs = ccrs.AlbersEqualArea()
ax = plt.axes(projection=crs)
gdf.to_crs(crs.proj4_init).plot(
ax=ax,
column='Fueltype',
markersize=gdf.Capacity/1e2,
)
ax.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7fe42f14f7f0>
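Besides a proj4 string, .to_crs() also accepts an EPSG code. As a minimal sketch of what reprojection does to the coordinates themselves, the following converts a single illustrative point from EPSG:4326 to EPSG:3035. The point is chosen at 10° E, 52° N, which is the projection centre of EPSG:3035, so the result should land very close to that CRS's false origin at (4321000, 3210000) metres:

```python
import geopandas as gpd
from shapely.geometry import Point

# One illustrative point at 10 deg E, 52 deg N, given as lon/lat (EPSG:4326).
pts = gpd.GeoSeries([Point(10, 52)], crs=4326)

# EPSG:3035 (ETRS89-extended / LAEA Europe) uses metres as units.
projected = pts.to_crs(3035)

print(pts.crs.is_geographic, projected.crs.is_projected)
print(projected.iloc[0].x, projected.iloc[0].y)
```

The geometry is the same location on Earth before and after; only the numbers used to describe it change.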
Reading and Writing Files with geopandas#
In the following example, we’ll load a dataset containing the NUTS regions:
Nomenclature of Territorial Units for Statistics or NUTS (French: Nomenclature des unités territoriales statistiques) is a geocode standard for referencing the subdivisions of countries for statistical purposes.
Our ultimate goal for this part of the tutorial is to map the power plant capacities to the NUTS-1 region they belong to.
Common filetypes for vector-based geospatial datasets are GeoPackage (.gpkg), GeoJSON (.geojson), File Geodatabase (.gdb), or Shapefiles (.shp).
In geopandas we can use the gpd.read_file() function to read such files. So let’s start:
nuts = gpd.read_file("../../data/nuts/NUTS_RG_10M_2021_4326.geojson")
nuts.head(3)
| | id | NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | FID | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BG423 | BG423 | 3 | BG | Pazardzhik | Пазарджик | 3.0 | 2 | 3 | BG423 | POLYGON ((24.42101 42.55306, 24.41032 42.46950... |
| 1 | BG424 | BG424 | 3 | BG | Smolyan | Смолян | 3.0 | 3 | 3 | BG424 | POLYGON ((25.07422 41.79348, 25.05851 41.75177... |
| 2 | BG425 | BG425 | 3 | BG | Kardzhali | Кърджали | 3.0 | 3 | 3 | BG425 | POLYGON ((25.94863 41.32034, 25.90644 41.30757... |
It is good practice to set an index. You can use .set_index() for that:
nuts = nuts.set_index('id')
We can also check out the geometries in the dataset with .geometry:
nuts.geometry
id
BG423 POLYGON ((24.42101 42.55306, 24.41032 42.46950...
BG424 POLYGON ((25.07422 41.79348, 25.05851 41.75177...
BG425 POLYGON ((25.94863 41.32034, 25.90644 41.30757...
CH011 MULTIPOLYGON (((6.86623 46.90929, 6.89621 46.9...
CH012 POLYGON ((8.47767 46.52760, 8.39953 46.48872, ...
...
LV POLYGON ((27.35158 57.51824, 27.54521 57.53444...
ME POLYGON ((20.06394 43.00682, 20.32958 42.91149...
MK POLYGON ((22.36021 42.31116, 22.51041 42.15516...
SK0 POLYGON ((19.88393 49.20418, 19.96275 49.23031...
IT MULTIPOLYGON (((12.47792 46.67984, 12.69064 46...
Name: geometry, Length: 2010, dtype: geometry
With .crs we can check in which coordinate reference system the data is given:
nuts.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
The .total_bounds attribute gives the bounding box of all geometries as (minx, miny, maxx, maxy):
nuts.total_bounds
array([-63.08825, -21.38917, 55.83616, 80.76427])
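The CRS object can also be queried programmatically, which is handy before measuring anything. The sketch below uses two illustrative rectangles (not the NUTS data) and a few attributes of the pyproj CRS object that geopandas returns from .crs:

```python
import geopandas as gpd
from shapely.geometry import box

# Two illustrative rectangles in longitude/latitude.
regions = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[box(5, 47, 10, 50), box(8, 49, 16, 55)],
    crs="EPSG:4326",
)

print(regions.crs.to_epsg())      # EPSG code as an integer
print(regions.crs.is_geographic)  # True for longitude/latitude systems
print(regions.total_bounds)       # minx, miny, maxx, maxy over all rows
```

A geographic CRS (is_geographic is True) signals that coordinates are in degrees, so lengths and areas should only be computed after reprojecting.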
Let’s filter by NUTS-1 level…
nuts1 = nuts.query("LEVL_CODE == 1")
… and explore what kind of geometries we have in the dataset …
nuts1.explore()
To write a GeoDataFrame back to a file, use GeoDataFrame.to_file(). The file format is inferred from the file extension.
nuts1.to_file("NUTS1.geojson")
Calculating the areas and buffers#
Before calculating areas or buffers, we need to reproject the GeoDataFrame to an equal-area projection with metre units (here: EPSG:3035):
nuts1 = nuts1.to_crs(3035)
The area can be accessed via .area and is given in m² (after projection). Let’s convert to km²:
area = nuts1.area / 1e6
area
id
AT1 23545.286205
AT2 25894.953057
EL4 17388.679384
EE0 45315.713593
EL3 3799.676547
...
PL7 29846.398582
PL8 63217.536546
PL9 35563.812826
RO2 72545.290405
SK0 49008.115415
Length: 125, dtype: float64
nuts1.explore(column=area, vmax=1e5)
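To see why the reprojection matters, the following sketch measures one illustrative 1° × 1° cell twice: once directly in EPSG:4326, where the result is in square degrees and has no physical meaning, and once after reprojecting to EPSG:3035. The cell and variable names are made up for this example:

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# An illustrative 1 deg x 1 deg cell centred on 10 deg E, 52 deg N.
cell = gpd.GeoSeries([box(9.5, 51.5, 10.5, 52.5)], crs=4326)

# Measuring in a geographic CRS yields square degrees (geopandas warns here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    square_degrees = cell.area.iloc[0]

# Reproject first, then measure: EPSG:3035 is equal-area with metre units.
area_km2 = cell.to_crs(3035).area.iloc[0] / 1e6

print(square_degrees)
print(round(area_km2))
```

The second number should come out at several thousand km², whereas the first is simply 1 square degree regardless of where on Earth the cell sits.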
We can also build a buffer of 1 km around each geometry using .buffer():
nuts1.buffer(1000).explore()
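The buffer distance is interpreted in the units of the CRS, which is why 1000 means 1000 m in EPSG:3035. As a rough sanity check, the sketch below buffers a single illustrative point and compares the resulting area with that of an ideal circle:

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# An illustrative point in EPSG:3035, where coordinates are in metres.
site = gpd.GeoSeries([Point(4321000, 3210000)], crs=3035)

# Buffer distances use the CRS units, so 1000 means 1000 m here.
disc = site.buffer(1000)

# Compare with the area of a perfect circle of radius 1000 m.
ratio = disc.area.iloc[0] / (math.pi * 1000**2)
print(round(ratio, 3))
```

The ratio is slightly below 1 because the buffer is a polygon that approximates the circle with straight segments. In EPSG:4326 the same call would buffer by 1000 degrees, which is almost never what is intended.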
Joining spatial datasets#
Multiple GeoDataFrames can be combined via spatial joins.
Observations from two datasets are combined with the .sjoin() method based on their spatial relationship to one another (e.g. whether they are intersecting or overlapping). The geopandas documentation on spatial joins describes the specific options.
Let’s first reproject the gdf object to the same CRS as nuts1:
gdf = gdf.to_crs(3035)
Then, let’s have a look at both datasets at once. We want to find out which points (representing power plants) lie within which shape (representing NUTS regions).
fig = plt.figure(figsize=(7,7))
ax = plt.axes(projection=ccrs.epsg(3035))
nuts1.plot(
ax=ax,
edgecolor='black',
facecolor='lightgrey'
)
gdf.to_crs(3035).plot(
ax=ax,
column='Fueltype',
markersize=gdf.Capacity/20,
legend=True
)
ax.set_extent([5, 19, 47, 55])
We can now apply the .sjoin() method to find which power plants lie within which NUTS1 region. By default, sjoin looks for intersections and keeps the geometries of the left GeoDataFrame.
joined = gdf.sjoin(nuts1)
If we look at this new GeoDataFrame, we now have additional columns from the NUTS1 data:
joined.head(3)
| id | Name | Fueltype | Technology | Set | Country | Capacity | Efficiency | DateIn | DateRetrofit | DateOut | ... | index_right | NUTS_ID | LEVL_CODE | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | FID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Doel | Nuclear | Steam Turbine | PP | Belgium | 2911.0 | NaN | 1975.0 | NaN | 2022.0 | ... | BE2 | BE2 | 1 | BE | Vlaams Gewest | Vlaams Gewest | 0.0 | 0 | 0 | BE2 |
| 170 | Drogenbos Tgv | Natural Gas | CCGT | PP | Belgium | 465.0 | NaN | NaN | NaN | NaN | ... | BE2 | BE2 | 1 | BE | Vlaams Gewest | Vlaams Gewest | 0.0 | 0 | 0 | BE2 |
| 172 | Rodenhuize | Bioenergy | Steam Turbine | PP | Belgium | 268.0 | NaN | NaN | NaN | NaN | ... | BE2 | BE2 | 1 | BE | Vlaams Gewest | Vlaams Gewest | 0.0 | 0 | 0 | BE2 |
3 rows × 29 columns
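The default join is an inner join, so points that fall into no region silently disappear from the result. The following toy sketch, with made-up regions and plants whose coordinates are arbitrary, contrasts the default with how="left", assuming a geopandas version that provides GeoDataFrame.sjoin() and the predicate argument:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up square regions (coordinates are arbitrary).
regions = gpd.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs=3035,
)

# Three made-up plants; the last one lies outside both regions.
plants = gpd.GeoDataFrame(
    {"plant": ["p1", "p2", "p3"], "Capacity": [100.0, 250.0, 50.0]},
    geometry=[Point(2, 5), Point(15, 5), Point(30, 5)],
    crs=3035,
)

# Default: how="inner", predicate="intersects"; unmatched plants are dropped.
inner = plants.sjoin(regions)

# how="left" keeps every plant; unmatched rows get NaN in the region columns.
left = plants.sjoin(regions, how="left", predicate="within")

print(inner[["plant", "region"]])
print(left[["plant", "region"]])
```

Comparing the lengths of the two results is a cheap way to find out how many points could not be assigned to any region.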
We can now use these new columns to group the capacities (and convert to a suitable unit):
cap = joined.groupby("NUTS_ID").Capacity.sum() / 1000 # GW
Let’s quickly check if all NUTS1 regions have power plants:
nuts1.index.difference(cap.index)
Index(['CY0', 'ES7', 'FI2', 'FRY', 'IS0', 'LI0', 'MK0', 'MT0', 'PT2', 'PT3',
'TR1', 'TR2', 'TR3', 'TR4', 'TR5', 'TR6', 'TR7', 'TR8', 'TR9', 'TRA',
'TRB', 'TRC'],
dtype='object')
This is not the case. It is then good practice to reindex the series to include all NUTS1 regions, even if this leads to some NaN values.
cap = cap.reindex(nuts1.index)
cap
id
AT1 4.382100
AT2 4.089500
EL4 0.427967
EE0 1.369000
EL3 0.022142
...
PL7 6.379902
PL8 0.685651
PL9 5.230788
RO2 2.036566
SK0 5.567220
Name: Capacity, Length: 125, dtype: float64
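The group-then-reindex pattern can be tried out on a tiny pandas-only example. The region codes and capacities below are made up, and the names joined_demo, all_regions and cap_demo are illustrative:

```python
import pandas as pd

# Made-up join result: capacities in MW with their region code.
joined_demo = pd.DataFrame(
    {"NUTS_ID": ["AT1", "AT1", "AT2"], "Capacity": [1000.0, 500.0, 2000.0]}
)

# Full list of regions, including one without any plants.
all_regions = pd.Index(["AT1", "AT2", "AT3"], name="id")

cap_demo = joined_demo.groupby("NUTS_ID").Capacity.sum() / 1000  # GW
cap_demo = cap_demo.reindex(all_regions)  # AT3 becomes NaN

print(cap_demo)
print(cap_demo.fillna(0.0))  # alternative: treat missing regions as 0 GW
```

Whether to keep NaN or fill with zero is a judgment call: NaN says "no plants in the dataset for this region", while zero asserts that there is no capacity at all.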
Finally, we can plot the total generation capacity per NUTS1 region on a map.
nuts1.plot(figsize=(7,7), column=cap, legend=True)
<AxesSubplot:>
This concludes the geopandas and cartopy tutorial.